Overview

Dataset statistics

Number of variables: 28
Number of observations: 5043
Missing cells: 2695
Missing cells (%): 1.9%
Duplicate rows: 33
Duplicate rows (%): 0.7%
Total size in memory: 4.9 MiB
Average record size in memory: 1020.3 B
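
The two percentages follow from the raw counts. A minimal sketch in plain Python, assuming the report divides missing cells by the total number of cells (variables × observations) and duplicate rows by the number of observations; the variable names are illustrative.

```python
# Counts copied from the dataset statistics above.
n_variables = 28
n_observations = 5043
missing_cells = 2695
duplicate_rows = 33

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: duplicate rows are measured against the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Under these assumptions the rounded results agree with the 1.9% and 0.7% reported above.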

Variable types

Categorical: 11
Numeric: 16
URL: 1

Warnings

Dataset has 33 (0.7%) duplicate rows Duplicates
director_name has a high cardinality: 2399 distinct values High cardinality
actor_2_name has a high cardinality: 3032 distinct values High cardinality
genres has a high cardinality: 914 distinct values High cardinality
actor_1_name has a high cardinality: 2097 distinct values High cardinality
movie_title has a high cardinality: 4917 distinct values High cardinality
actor_3_name has a high cardinality: 3521 distinct values High cardinality
plot_keywords has a high cardinality: 4760 distinct values High cardinality
country has a high cardinality: 63 distinct values High cardinality
num_critic_for_reviews is highly correlated with num_voted_users and 2 other fields High correlation
actor_3_facebook_likes is highly correlated with actor_2_facebook_likes High correlation
actor_1_facebook_likes is highly correlated with cast_total_facebook_likes High correlation
gross is highly correlated with num_voted_users and 1 other field High correlation
num_voted_users is highly correlated with num_critic_for_reviews and 3 other fields High correlation
cast_total_facebook_likes is highly correlated with actor_1_facebook_likes and 1 other field High correlation
num_user_for_reviews is highly correlated with num_critic_for_reviews and 2 other fields High correlation
actor_2_facebook_likes is highly correlated with actor_3_facebook_likes and 1 other field High correlation
movie_facebook_likes is highly correlated with num_critic_for_reviews and 1 other field High correlation
num_critic_for_reviews is highly correlated with num_voted_users and 1 other field High correlation
actor_3_facebook_likes is highly correlated with actor_1_facebook_likes and 2 other fields High correlation
actor_1_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
gross is highly correlated with num_voted_users and 2 other fields High correlation
num_voted_users is highly correlated with num_critic_for_reviews and 3 other fields High correlation
cast_total_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
num_user_for_reviews is highly correlated with num_critic_for_reviews and 2 other fields High correlation
budget is highly correlated with gross and 1 other field High correlation
actor_2_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
num_critic_for_reviews is highly correlated with num_voted_users and 1 other field High correlation
actor_3_facebook_likes is highly correlated with cast_total_facebook_likes and 1 other field High correlation
actor_1_facebook_likes is highly correlated with cast_total_facebook_likes and 1 other field High correlation
gross is highly correlated with num_voted_users High correlation
num_voted_users is highly correlated with num_critic_for_reviews and 2 other fields High correlation
cast_total_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
num_user_for_reviews is highly correlated with num_critic_for_reviews and 1 other field High correlation
actor_2_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
cast_total_facebook_likes is highly correlated with actor_3_facebook_likes and 2 other fields High correlation
actor_3_facebook_likes is highly correlated with cast_total_facebook_likes and 2 other fields High correlation
gross is highly correlated with actor_3_facebook_likes and 3 other fields High correlation
color is highly correlated with title_year High correlation
title_year is highly correlated with color and 1 other field High correlation
duration is highly correlated with country and 3 other fields High correlation
country is highly correlated with duration and 2 other fields High correlation
content_rating is highly correlated with title_year and 2 other fields High correlation
actor_1_facebook_likes is highly correlated with cast_total_facebook_likes High correlation
movie_facebook_likes is highly correlated with num_critic_for_reviews and 1 other field High correlation
imdb_score is highly correlated with num_user_for_reviews and 1 other field High correlation
budget is highly correlated with country and 1 other field High correlation
num_user_for_reviews is highly correlated with gross and 3 other fields High correlation
aspect_ratio is highly correlated with duration and 1 other field High correlation
language is highly correlated with duration and 2 other fields High correlation
num_critic_for_reviews is highly correlated with gross and 3 other fields High correlation
actor_2_facebook_likes is highly correlated with cast_total_facebook_likes High correlation
num_voted_users is highly correlated with actor_3_facebook_likes and 5 other fields High correlation
language is highly correlated with country High correlation
country is highly correlated with language High correlation
director_name has 103 (2.0%) missing values Missing
director_facebook_likes has 104 (2.1%) missing values Missing
gross has 884 (17.5%) missing values Missing
plot_keywords has 153 (3.0%) missing values Missing
content_rating has 303 (6.0%) missing values Missing
budget has 492 (9.8%) missing values Missing
title_year has 108 (2.1%) missing values Missing
aspect_ratio has 329 (6.5%) missing values Missing
budget is highly skewed (γ1 = 48.15743539) Skewed
movie_title is uniformly distributed Uniform
actor_3_name is uniformly distributed Uniform
plot_keywords is uniformly distributed Uniform
director_facebook_likes has 907 (18.0%) zeros Zeros
actor_3_facebook_likes has 89 (1.8%) zeros Zeros
facenumber_in_poster has 2152 (42.7%) zeros Zeros
actor_2_facebook_likes has 55 (1.1%) zeros Zeros
movie_facebook_likes has 2181 (43.2%) zeros Zeros
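
The Skewed warning on budget reports γ1, the skewness of the column. As a rough illustration, the sketch below computes the Fisher-Pearson moment coefficient on a small made-up sample. The function name and the data are hypothetical, and a profiling tool may use a bias-corrected variant, so its numbers can differ slightly.

```python
from statistics import fmean


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5."""
    mean = fmean(values)
    deviations = [v - mean for v in values]
    m2 = fmean(d ** 2 for d in deviations)  # second central moment
    m3 = fmean(d ** 3 for d in deviations)  # third central moment
    return m3 / m2 ** 1.5


# Made-up data: one large value stretches the right tail.
sample = [1, 2, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(skewness(sample))     # positive: long right tail
print(skewness(symmetric))  # zero for symmetric data
```

A value near 48, as reported for budget, would indicate a far more extreme right tail than this toy sample.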

Reproduction

Analysis started: 2021-09-08 08:12:40.314092
Analysis finished: 2021-09-08 08:13:27.199459
Duration: 46.89 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
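
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2021-09-08 08:12:40.314092")
finished = datetime.fromisoformat("2021-09-08 08:13:27.199459")

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```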

Variables

color
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 19
Missing (%): 0.4%
Memory size: 307.2 KiB
Color: 4815
Black and White: 209

Length

Max length: 16
Median length: 5
Mean length: 5.457603503
Min length: 5

Characters and Unicode

Total characters: 27419
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
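
As an illustration of those character properties, the sketch below uses Python's standard unicodedata module to tally general categories for this variable. It assumes the black-and-white label is stored with a leading space (" Black and White", 16 characters), which is what the reported maximum length of 16 and the 627 space characters suggest. The variable names are illustrative.

```python
import unicodedata
from collections import Counter

# Value counts from the report. The leading space in " Black and White" is an
# assumption inferred from the reported max length (16) and space count (627).
value_counts = {"Color": 4815, " Black and White": 209}

category_counts = Counter()
for value, count in value_counts.items():
    for char in value:
        # unicodedata.category returns a two-letter general category code,
        # e.g. "Lu" (uppercase letter), "Ll" (lowercase letter), "Zs" (space).
        category_counts[unicodedata.category(char)] += count

total_characters = sum(category_counts.values())
print(category_counts)
print(total_characters)
```

Under that assumption the tallies line up with the category counts and the total character count reported for this variable.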

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Color
2nd row: Color
3rd row: Color
4th row: Color
5th row: Color

Common Values

Value            Count  Frequency (%)
Color            4815   95.5%
Black and White  209    4.1%
(Missing)        19     0.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
color  4815   88.5%
white  209    3.8%
and    209    3.8%
black  209    3.8%

Most occurring characters

Value             Count  Frequency (%)
o                 9630   35.1%
l                 5024   18.3%
C                 4815   17.6%
r                 4815   17.6%
(space)           627    2.3%
a                 418    1.5%
B                 209    0.8%
c                 209    0.8%
k                 209    0.8%
n                 209    0.8%
Other values (6)  1254   4.6%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  21559  78.6%
Uppercase Letter  5233   19.1%
Space Separator   627    2.3%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
o                 9630   44.7%
l                 5024   23.3%
r                 4815   22.3%
a                 418    1.9%
c                 209    1.0%
k                 209    1.0%
n                 209    1.0%
d                 209    1.0%
h                 209    1.0%
i                 209    1.0%
Other values (2)  418    1.9%

Uppercase Letter
Value  Count  Frequency (%)
C      4815   92.0%
B      209    4.0%
W      209    4.0%

Space Separator
Value    Count  Frequency (%)
(space)  627    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   26792  97.7%
Common  627    2.3%

Most frequent character per script

Latin
Value             Count  Frequency (%)
o                 9630   35.9%
l                 5024   18.8%
C                 4815   18.0%
r                 4815   18.0%
a                 418    1.6%
B                 209    0.8%
c                 209    0.8%
k                 209    0.8%
n                 209    0.8%
d                 209    0.8%
Other values (5)  1045   3.9%

Common
Value    Count  Frequency (%)
(space)  627    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  27419  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 9630   35.1%
l                 5024   18.3%
C                 4815   17.6%
r                 4815   17.6%
(space)           627    2.3%
a                 418    1.5%
B                 209    0.8%
c                 209    0.8%
k                 209    0.8%
n                 209    0.8%
Other values (6)  1254   4.6%

director_name
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2399
Distinct (%): 48.6%
Missing: 103
Missing (%): 2.0%
Memory size: 344.2 KiB
Steven Spielberg: 26
Woody Allen: 22
Martin Scorsese: 20
Clint Eastwood: 20
Ridley Scott: 17
Other values (2394): 4835

Length

Max length: 32
Median length: 13
Mean length: 13.08421053
Min length: 3

Characters and Unicode

Total characters: 64636
Distinct characters: 76
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1505
Unique (%): 30.5%

Sample

1st row: Tara Subkoff
2nd row: Jaume Balagueró
3rd row: Jaume Balagueró
4th row: Dan Trachtenberg
5th row: Timothy Hines

Common Values

Value                Count  Frequency (%)
Steven Spielberg     26     0.5%
Woody Allen          22     0.4%
Martin Scorsese      20     0.4%
Clint Eastwood       20     0.4%
Ridley Scott         17     0.3%
Spike Lee            16     0.3%
Tim Burton           16     0.3%
Steven Soderbergh    16     0.3%
Renny Harlin         15     0.3%
Oliver Stone         14     0.3%
Other values (2389)  4758   94.3%
(Missing)            103    2.0%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
john                 180    1.8%
david                150    1.5%
michael              127    1.2%
james                87     0.8%
peter                85     0.8%
robert               84     0.8%
paul                 81     0.8%
richard              80     0.8%
scott                65     0.6%
lee                  58     0.6%
Other values (2967)  9279   90.3%
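
The word table above appears to count lowercased name tokens. A small sketch of that idea, using only the ten most common director names listed under Common Values. Splitting on whitespace is an assumption about the tokenisation, and the totals here are far below the report's because the other 2389 names are left out.

```python
from collections import Counter

# Top ten directors and film counts, copied from Common Values above.
director_counts = {
    "Steven Spielberg": 26, "Woody Allen": 22, "Martin Scorsese": 20,
    "Clint Eastwood": 20, "Ridley Scott": 17, "Spike Lee": 16,
    "Tim Burton": 16, "Steven Soderbergh": 16, "Renny Harlin": 15,
    "Oliver Stone": 14,
}

word_counts = Counter()
for name, films in director_counts.items():
    # Assumed tokenisation: lowercase, then split on whitespace.
    for word in name.lower().split():
        word_counts[word] += films

print(word_counts.most_common(3))
```

Even in this partial view a shared first name such as "steven" accumulates counts across several directors, which is how common given names end up at the top of the full table.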

Most occurring characters

ValueCountFrequency (%)
e6098
 
9.4%
5336
 
8.3%
a5279
 
8.2%
n4658
 
7.2%
r4449
 
6.9%
o3795
 
5.9%
i3693
 
5.7%
l2970
 
4.6%
t2321
 
3.6%
s2089
 
3.2%
Other values (66)23948
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter48458
75.0%
Uppercase Letter10495
 
16.2%
Space Separator5336
 
8.3%
Other Punctuation260
 
0.4%
Dash Punctuation87
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6098
12.6%
a5279
10.9%
n4658
9.6%
r4449
 
9.2%
o3795
 
7.8%
i3693
 
7.6%
l2970
 
6.1%
t2321
 
4.8%
s2089
 
4.3%
h1851
 
3.8%
Other values (31)11255
23.2%
Uppercase Letter
ValueCountFrequency (%)
S999
 
9.5%
J925
 
8.8%
M886
 
8.4%
R758
 
7.2%
C712
 
6.8%
B678
 
6.5%
D619
 
5.9%
A569
 
5.4%
L499
 
4.8%
P488
 
4.6%
Other values (21)3362
32.0%
Other Punctuation
ValueCountFrequency (%)
.239
91.9%
'21
 
8.1%
Space Separator
ValueCountFrequency (%)
5336
100.0%
Dash Punctuation
ValueCountFrequency (%)
-87
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin58953
91.2%
Common5683
 
8.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6098
 
10.3%
a5279
 
9.0%
n4658
 
7.9%
r4449
 
7.5%
o3795
 
6.4%
i3693
 
6.3%
l2970
 
5.0%
t2321
 
3.9%
s2089
 
3.5%
h1851
 
3.1%
Other values (62)21750
36.9%
Common
ValueCountFrequency (%)
5336
93.9%
.239
 
4.2%
-87
 
1.5%
'21
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII64494
99.8%
Latin 1 Sup142
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6098
 
9.5%
5336
 
8.3%
a5279
 
8.2%
n4658
 
7.2%
r4449
 
6.9%
o3795
 
5.9%
i3693
 
5.7%
l2970
 
4.6%
t2321
 
3.6%
s2089
 
3.2%
Other values (46)23806
36.9%
Latin 1 Sup
ValueCountFrequency (%)
é45
31.7%
á19
13.4%
ó16
 
11.3%
ö16
 
11.3%
í8
 
5.6%
ñ7
 
4.9%
å6
 
4.2%
ç5
 
3.5%
É3
 
2.1%
ä2
 
1.4%
Other values (10)15
 
10.6%

num_critic_for_reviews
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 528
Distinct (%): 10.6%
Missing: 50
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 140.194272
Minimum: 1
Maximum: 813
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 9
Q1: 50
median: 110
Q3: 195
95-th percentile: 387
Maximum: 813
Range: 812
Interquartile range (IQR): 145

Descriptive statistics

Standard deviation: 121.6016754
Coefficient of variation (CV): 0.8673797701
Kurtosis: 2.91341641
Mean: 140.194272
Median Absolute Deviation (MAD): 68
Skewness: 1.5165327
Sum: 699990
Variance: 14786.96746
Monotonicity: Not monotonic
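
Several of these statistics are simple functions of the others. The sketch below re-derives them from the values reported for num_critic_for_reviews, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × non-missing count). The variable names are illustrative.

```python
import math

# Values copied from the statistics reported for num_critic_for_reviews.
minimum, maximum = 1, 813
q1, q3 = 50, 195
mean = 140.194272
std = 121.6016754
non_missing = 5043 - 50  # observations minus missing values

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * non_missing       # Sum

print(value_range, iqr, round(cv, 4), round(variance, 2), round(total))
```

The small differences in the last decimals come from the inputs already being rounded in the report.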
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1                   43     0.9%
9                   37     0.7%
5                   36     0.7%
10                  35     0.7%
8                   35     0.7%
12                  34     0.7%
16                  33     0.7%
81                  33     0.7%
43                  31     0.6%
29                  30     0.6%
Other values (518)  4646   92.1%
(Missing)           50     1.0%

Value  Count  Frequency (%)
1      43     0.9%
2      26     0.5%
3      24     0.5%
4      29     0.6%
5      36     0.7%
6      28     0.6%
7      23     0.5%
8      35     0.7%
9      37     0.7%
10     35     0.7%

Value  Count  Frequency (%)
813    1      < 0.1%
775    1      < 0.1%
765    1      < 0.1%
750    2      < 0.1%
739    1      < 0.1%
738    1      < 0.1%
733    1      < 0.1%
723    1      < 0.1%
712    1      < 0.1%
703    2      < 0.1%

duration
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 192
Distinct (%): 3.8%
Missing: 13
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.2427435
Minimum: 7
Maximum: 511
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 7
5-th percentile: 81
Q1: 93
median: 103
Q3: 118
95-th percentile: 146
Maximum: 511
Range: 504
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 25.49736947
Coefficient of variation (CV): 0.237753797
Kurtosis: 23.93554408
Mean: 107.2427435
Median Absolute Deviation (MAD): 12
Skewness: 2.489825518
Sum: 539431
Variance: 650.1158501
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
90                  161    3.2%
100                 141    2.8%
101                 139    2.8%
98                  135    2.7%
97                  131    2.6%
93                  129    2.6%
99                  124    2.5%
95                  124    2.5%
94                  124    2.5%
96                  113    2.2%
Other values (182)  3709   73.5%

Value  Count  Frequency (%)
7      2      < 0.1%
11     1      < 0.1%
14     1      < 0.1%
20     1      < 0.1%
22     7      0.1%
23     2      < 0.1%
24     2      < 0.1%
25     4      0.1%
27     1      < 0.1%
28     1      < 0.1%

Value  Count  Frequency (%)
511    1      < 0.1%
379    1      < 0.1%
334    1      < 0.1%
330    1      < 0.1%
325    1      < 0.1%
300    1      < 0.1%
293    1      < 0.1%
289    1      < 0.1%
286    1      < 0.1%
280    1      < 0.1%

director_facebook_likes
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 435
Distinct (%): 8.8%
Missing: 104
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 686.5092124
Minimum: 0
Maximum: 23000
Zeros: 907
Zeros (%): 18.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 7
median: 49
Q3: 194.5
95-th percentile: 973
Maximum: 23000
Range: 23000
Interquartile range (IQR): 187.5

Descriptive statistics

Standard deviation: 2813.328607
Coefficient of variation (CV): 4.098020181
Kurtosis: 27.25628935
Mean: 686.5092124
Median Absolute Deviation (MAD): 49
Skewness: 5.22970117
Sum: 3390669
Variance: 7914817.85
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   907    18.0%
3                   70     1.4%
6                   66     1.3%
7                   64     1.3%
2                   63     1.2%
4                   60     1.2%
11                  59     1.2%
10                  53     1.1%
8                   52     1.0%
5                   52     1.0%
Other values (425)  3493   69.3%
(Missing)           104    2.1%

Value  Count  Frequency (%)
0      907    18.0%
2      63     1.2%
3      70     1.4%
4      60     1.2%
5      52     1.0%
6      66     1.3%
7      64     1.3%
8      52     1.0%
9      49     1.0%
10     53     1.1%

Value  Count  Frequency (%)
23000  1      < 0.1%
22000  8      0.2%
21000  10     0.2%
20000  1      < 0.1%
18000  4      0.1%
17000  20     0.4%
16000  28     0.6%
15000  2      < 0.1%
14000  30     0.6%
13000  26     0.5%

actor_3_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 906
Distinct (%): 18.0%
Missing: 23
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 645.009761
Minimum: 0
Maximum: 23000
Zeros: 89
Zeros (%): 1.8%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 133
median: 371.5
Q3: 636
95-th percentile: 1000
Maximum: 23000
Range: 23000
Interquartile range (IQR): 503

Descriptive statistics

Standard deviation: 1665.041728
Coefficient of variation (CV): 2.581420979
Kurtosis: 60.56388811
Mean: 645.009761
Median Absolute Deviation (MAD): 248.5
Skewness: 7.279020793
Sum: 3237949
Variance: 2772363.957
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1000                126    2.5%
0                   89     1.8%
11000               29     0.6%
3                   28     0.6%
2000                27     0.5%
3000                26     0.5%
826                 22     0.4%
2                   21     0.4%
7                   21     0.4%
4                   21     0.4%
Other values (896)  4610   91.4%
(Missing)           23     0.5%

Value  Count  Frequency (%)
0      89     1.8%
2      21     0.4%
3      28     0.6%
4      21     0.4%
5      18     0.4%
6      18     0.4%
7      21     0.4%
8      17     0.3%
9      16     0.3%
10     12     0.2%

Value  Count  Frequency (%)
23000  2      < 0.1%
20000  1      < 0.1%
19000  5      0.1%
17000  1      < 0.1%
16000  3      0.1%
15000  1      < 0.1%
14000  6      0.1%
13000  5      0.1%
12000  8      0.2%
11000  29     0.6%

actor_2_name
Categorical

HIGH CARDINALITY

Distinct: 3032
Distinct (%): 60.3%
Missing: 13
Missing (%): 0.3%
Memory size: 347.4 KiB
Morgan Freeman: 20
Charlize Theron: 15
Brad Pitt: 14
James Franco: 11
Meryl Streep: 11
Other values (3027): 4959

Length

Max length: 28
Median length: 13
Mean length: 13.07435388
Min length: 3

Characters and Unicode

Total characters: 65764
Distinct characters: 80
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2089
Unique (%): 41.5%

Sample

1st row: Balthazar Getty
2nd row: Pablo Rosso
3rd row: Pablo Rosso
4th row: John Gallagher Jr.
5th row: Kelly LeBrock

Common Values

Value                Count  Frequency (%)
Morgan Freeman       20     0.4%
Charlize Theron      15     0.3%
Brad Pitt            14     0.3%
James Franco         11     0.2%
Meryl Streep         11     0.2%
Jason Flemyng        10     0.2%
Adam Sandler         10     0.2%
Scott Glenn          9      0.2%
Steve Buscemi        9      0.2%
Judy Greer           9      0.2%
Other values (3022)  4912   97.4%
(Missing)            13     0.3%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
michael              102    1.0%
david                60     0.6%
john                 56     0.5%
james                53     0.5%
scott                52     0.5%
tom                  50     0.5%
jason                44     0.4%
robert               44     0.4%
kevin                41     0.4%
thomas               39     0.4%
Other values (3825)  9861   94.8%

Most occurring characters

ValueCountFrequency (%)
e6221
 
9.5%
a5930
 
9.0%
5372
 
8.2%
n4762
 
7.2%
r4398
 
6.7%
i4018
 
6.1%
o3645
 
5.5%
l3420
 
5.2%
t2348
 
3.6%
s2160
 
3.3%
Other values (70)23490
35.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter49447
75.2%
Uppercase Letter10686
 
16.2%
Space Separator5372
 
8.2%
Other Punctuation189
 
0.3%
Dash Punctuation64
 
0.1%
Decimal Number6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6221
12.6%
a5930
12.0%
n4762
9.6%
r4398
8.9%
i4018
 
8.1%
o3645
 
7.4%
l3420
 
6.9%
t2348
 
4.7%
s2160
 
4.4%
h1796
 
3.6%
Other values (38)10749
21.7%
Uppercase Letter
ValueCountFrequency (%)
M999
 
9.3%
S821
 
7.7%
C815
 
7.6%
B773
 
7.2%
J770
 
7.2%
D668
 
6.3%
A640
 
6.0%
R592
 
5.5%
L511
 
4.8%
T463
 
4.3%
Other values (16)3634
34.0%
Other Punctuation
ValueCountFrequency (%)
.124
65.6%
'65
34.4%
Decimal Number
ValueCountFrequency (%)
53
50.0%
03
50.0%
Space Separator
ValueCountFrequency (%)
5372
100.0%
Dash Punctuation
ValueCountFrequency (%)
-64
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin60133
91.4%
Common5631
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6221
 
10.3%
a5930
 
9.9%
n4762
 
7.9%
r4398
 
7.3%
i4018
 
6.7%
o3645
 
6.1%
l3420
 
5.7%
t2348
 
3.9%
s2160
 
3.6%
h1796
 
3.0%
Other values (64)21435
35.6%
Common
ValueCountFrequency (%)
5372
95.4%
.124
 
2.2%
'65
 
1.2%
-64
 
1.1%
53
 
0.1%
03
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII65642
99.8%
Latin 1 Sup122
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6221
 
9.5%
a5930
 
9.0%
5372
 
8.2%
n4762
 
7.3%
r4398
 
6.7%
i4018
 
6.1%
o3645
 
5.6%
l3420
 
5.2%
t2348
 
3.6%
s2160
 
3.3%
Other values (48)23368
35.6%
Latin 1 Sup
ValueCountFrequency (%)
é43
35.2%
í14
 
11.5%
á10
 
8.2%
ë8
 
6.6%
ó6
 
4.9%
ø6
 
4.9%
å5
 
4.1%
ü4
 
3.3%
ö3
 
2.5%
û3
 
2.5%
Other values (12)20
16.4%

actor_1_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 878
Distinct (%): 17.4%
Missing: 7
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6560.047061
Minimum: 0
Maximum: 640000
Zeros: 26
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 95.5
Q1: 614
median: 988
Q3: 11000
95-th percentile: 24000
Maximum: 640000
Range: 640000
Interquartile range (IQR): 10386

Descriptive statistics

Standard deviation: 15020.75912
Coefficient of variation (CV): 2.289733439
Kurtosis: 683.5473559
Mean: 6560.047061
Median Absolute Deviation (MAD): 752.5
Skewness: 19.12177638
Sum: 33036397
Variance: 225623204.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
1000                449    8.9%
11000               211    4.2%
2000                197    3.9%
3000                155    3.1%
12000               135    2.7%
13000               127    2.5%
14000               123    2.4%
10000               112    2.2%
18000               109    2.2%
22000               82     1.6%
Other values (868)  3336   66.2%

Value  Count  Frequency (%)
0      26     0.5%
2      8      0.2%
3      4      0.1%
4      2      < 0.1%
5      7      0.1%
6      3      0.1%
7      3      0.1%
8      1      < 0.1%
9      3      0.1%
10     1      < 0.1%

Value   Count  Frequency (%)
640000  1      < 0.1%
260000  3      0.1%
164000  2      < 0.1%
137000  2      < 0.1%
87000   8      0.2%
77000   1      < 0.1%
49000   27     0.5%
46000   1      < 0.1%
45000   5      0.1%
44000   2      < 0.1%

gross
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 4035
Distinct (%): 97.0%
Missing: 884
Missing (%): 17.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 48468407.53
Minimum: 162
Maximum: 760505847
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 39.5 KiB

Quantile statistics

Minimum: 162
5-th percentile: 99034
Q1: 5340987.5
median: 25517500
Q3: 62309437.5
95-th percentile: 180029729.4
Maximum: 760505847
Range: 760505685
Interquartile range (IQR): 56968450

Descriptive statistics

Standard deviation: 68452990.44
Coefficient of variation (CV): 1.412321839
Kurtosis: 14.86886885
Mean: 48468407.53
Median Absolute Deviation (MAD): 23241132
Skewness: 3.127203838
Sum: 2.015801069 × 10^11
Variance: 4.6858119 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
47000000             3      0.1%
3000000              3      0.1%
144512310            3      0.1%
34964818             3      0.1%
5773519              3      0.1%
177343675            3      0.1%
8000000              3      0.1%
218051260            3      0.1%
2000000              2      < 0.1%
35799026             2      < 0.1%
Other values (4025)  4131   81.9%
(Missing)            884    17.5%

Value  Count  Frequency (%)
162    1      < 0.1%
703    1      < 0.1%
721    1      < 0.1%
728    1      < 0.1%
828    1      < 0.1%
1111   1      < 0.1%
1332   1      < 0.1%
1521   1      < 0.1%
1711   1      < 0.1%
2245   1      < 0.1%

Value      Count  Frequency (%)
760505847  1      < 0.1%
658672302  1      < 0.1%
652177271  1      < 0.1%
623279547  2      < 0.1%
533316061  1      < 0.1%
474544677  1      < 0.1%
460935665  1      < 0.1%
458991599  1      < 0.1%
448130642  1      < 0.1%
436471036  1      < 0.1%

genres
Categorical

HIGH CARDINALITY

Distinct: 914
Distinct (%): 18.1%
Missing: 0
Missing (%): 0.0%
Memory size: 380.9 KiB
Drama: 236
Comedy: 209
Comedy|Drama: 191
Comedy|Drama|Romance: 187
Comedy|Romance: 158
Other values (909): 4062

Length

Max length: 64
Median length: 20
Mean length: 20.31310728
Min length: 5

Characters and Unicode

Total characters: 102439
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 495
Unique (%): 9.8%

Sample

1st row: Drama|Horror|Mystery|Thriller
2nd row: Horror
3rd row: Horror
4th row: Drama|Horror|Mystery|Sci-Fi|Thriller
5th row: Drama

Common Values

Value                        Count  Frequency (%)
Drama                        236    4.7%
Comedy                       209    4.1%
Comedy|Drama                 191    3.8%
Comedy|Drama|Romance         187    3.7%
Comedy|Romance               158    3.1%
Drama|Romance                152    3.0%
Crime|Drama|Thriller         101    2.0%
Horror                       71     1.4%
Action|Crime|Drama|Thriller  68     1.3%
Action|Crime|Thriller        65     1.3%
Other values (904)           3605   71.5%
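
genres packs several genres into one pipe-delimited string, which is why 914 distinct combinations appear. One way to look at individual genres instead is to split on "|" before counting. The sketch below does this for the ten combinations listed above only, so the totals are partial; the variable names are illustrative.

```python
from collections import Counter

# Ten most common genre combinations, copied from Common Values above.
combo_counts = {
    "Drama": 236, "Comedy": 209, "Comedy|Drama": 191,
    "Comedy|Drama|Romance": 187, "Comedy|Romance": 158,
    "Drama|Romance": 152, "Crime|Drama|Thriller": 101, "Horror": 71,
    "Action|Crime|Drama|Thriller": 68, "Action|Crime|Thriller": 65,
}

genre_counts = Counter()
for combo, films in combo_counts.items():
    # Each film contributes once to every genre in its pipe-delimited label.
    for genre in combo.split("|"):
        genre_counts[genre] += films

print(genre_counts.most_common())
```

Counting this way collapses the ten combinations into seven individual genres, which is usually a more useful cardinality for modelling than the raw combined strings.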

Length

Histogram of lengths of the category
Value                        Count  Frequency (%)
drama                        236    4.7%
comedy                       209    4.1%
comedy|drama                 191    3.8%
comedy|drama|romance         187    3.7%
comedy|romance               158    3.1%
drama|romance                152    3.0%
crime|drama|thriller         101    2.0%
horror                       71     1.4%
action|crime|drama|thriller  68     1.3%
action|crime|thriller        65     1.3%
Other values (904)           3605   71.5%

Most occurring characters

ValueCountFrequency (%)
r10547
 
10.3%
|9461
 
9.2%
a9065
 
8.8%
e7946
 
7.8%
m7378
 
7.2%
i6575
 
6.4%
o6319
 
6.2%
y4651
 
4.5%
n4495
 
4.4%
t4042
 
3.9%
Other values (25)31960
31.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter77222
75.4%
Uppercase Letter15131
 
14.8%
Math Symbol9461
 
9.2%
Dash Punctuation625
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r10547
13.7%
a9065
11.7%
e7946
10.3%
m7378
9.6%
i6575
8.5%
o6319
8.2%
y4651
 
6.0%
n4495
 
5.8%
t4042
 
5.2%
l3508
 
4.5%
Other values (9)12696
16.4%
Uppercase Letter
ValueCountFrequency (%)
C2761
18.2%
D2715
17.9%
A2318
15.3%
F1778
11.8%
T1413
9.3%
R1109
7.3%
M846
 
5.6%
S804
 
5.3%
H772
 
5.1%
W310
 
2.0%
Other values (4)305
 
2.0%
Math Symbol
ValueCountFrequency (%)
|9461
100.0%
Dash Punctuation
ValueCountFrequency (%)
-625
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin92353
90.2%
Common10086
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
r10547
 
11.4%
a9065
 
9.8%
e7946
 
8.6%
m7378
 
8.0%
i6575
 
7.1%
o6319
 
6.8%
y4651
 
5.0%
n4495
 
4.9%
t4042
 
4.4%
l3508
 
3.8%
Other values (23)27827
30.1%
Common
ValueCountFrequency (%)
|9461
93.8%
-625
 
6.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII102439
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r10547
 
10.3%
|9461
 
9.2%
a9065
 
8.8%
e7946
 
7.8%
m7378
 
7.2%
i6575
 
6.4%
o6319
 
6.2%
y4651
 
4.5%
n4495
 
4.4%
t4042
 
3.9%
Other values (25)31960
31.2%

actor_1_name
Categorical

HIGH CARDINALITY

Distinct: 2097
Distinct (%): 41.6%
Missing: 7
Missing (%): 0.1%
Memory size: 347.3 KiB
Robert De Niro: 49
Johnny Depp: 41
Nicolas Cage: 33
J.K. Simmons: 31
Matt Damon: 30
Other values (2092): 4852

Length

Max length: 27
Median length: 13
Mean length: 13.19241461
Min length: 4

Characters and Unicode

Total characters: 66437
Distinct characters: 76
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1360
Unique (%): 27.0%

Sample

1st row: Timothy Hutton
2nd row: Jonathan D. Mellor
3rd row: Manuela Velasco
4th row: Bradley Cooper
5th row: Christopher Lambert

Common Values

Value                Count  Frequency (%)
Robert De Niro       49     1.0%
Johnny Depp          41     0.8%
Nicolas Cage         33     0.7%
J.K. Simmons         31     0.6%
Matt Damon           30     0.6%
Bruce Willis         30     0.6%
Denzel Washington    30     0.6%
Liam Neeson          29     0.6%
Harrison Ford        27     0.5%
Robin Williams       27     0.5%
Other values (2087)  4709   93.4%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
robert               109    1.0%
tom                  93     0.9%
michael              89     0.9%
jason                59     0.6%
de                   57     0.5%
james                54     0.5%
bruce                51     0.5%
steve                50     0.5%
jr                   49     0.5%
niro                 49     0.5%
Other values (2888)  9784   93.7%

Most occurring characters

ValueCountFrequency (%)
e6213
 
9.4%
a5732
 
8.6%
5408
 
8.1%
n4818
 
7.3%
r4311
 
6.5%
i4249
 
6.4%
o3918
 
5.9%
l3312
 
5.0%
t2569
 
3.9%
s2349
 
3.5%
Other values (66)23558
35.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter50016
75.3%
Uppercase Letter10711
 
16.1%
Space Separator5408
 
8.1%
Other Punctuation227
 
0.3%
Dash Punctuation73
 
0.1%
Decimal Number2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6213
12.4%
a5732
11.5%
n4818
9.6%
r4311
8.6%
i4249
 
8.5%
o3918
 
7.8%
l3312
 
6.6%
t2569
 
5.1%
s2349
 
4.7%
h1791
 
3.6%
Other values (32)10754
21.5%
Uppercase Letter
ValueCountFrequency (%)
J954
 
8.9%
M912
 
8.5%
S853
 
8.0%
C818
 
7.6%
B741
 
6.9%
D728
 
6.8%
R635
 
5.9%
H524
 
4.9%
A499
 
4.7%
L490
 
4.6%
Other values (18)3557
33.2%
Other Punctuation
ValueCountFrequency (%)
.179
78.9%
'48
 
21.1%
Decimal Number
ValueCountFrequency (%)
51
50.0%
01
50.0%
Space Separator
ValueCountFrequency (%)
5408
100.0%
Dash Punctuation
ValueCountFrequency (%)
-73
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin60727
91.4%
Common5710
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6213
 
10.2%
a5732
 
9.4%
n4818
 
7.9%
r4311
 
7.1%
i4249
 
7.0%
o3918
 
6.5%
l3312
 
5.5%
t2569
 
4.2%
s2349
 
3.9%
h1791
 
2.9%
Other values (60)21465
35.3%
Common
ValueCountFrequency (%)
5408
94.7%
.179
 
3.1%
-73
 
1.3%
'48
 
0.8%
51
 
< 0.1%
01
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII66357
99.9%
Latin 1 Sup80
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6213
 
9.4%
a5732
 
8.6%
5408
 
8.1%
n4818
 
7.3%
r4311
 
6.5%
i4249
 
6.4%
o3918
 
5.9%
l3312
 
5.0%
t2569
 
3.9%
s2349
 
3.5%
Other values (48)23478
35.4%
Latin 1 Sup
ValueCountFrequency (%)
é20
25.0%
ë15
18.8%
á7
 
8.8%
í6
 
7.5%
å5
 
6.2%
ç5
 
6.2%
ø4
 
5.0%
Ó3
 
3.8%
ü2
 
2.5%
Á2
 
2.5%
Other values (8)11
13.8%

movie_title
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4917
Distinct (%): 97.5%
Missing: 0
Missing (%): 0.0%
Memory size: 357.9 KiB
King Kong: 3
Halloween: 3
Pan: 3
Victor Frankenstein: 3
Home: 3
Other values (4912): 5028

Length

Max length: 86
Median length: 14
Mean length: 15.54987111
Min length: 1

Characters and Unicode

Total characters: 78418
Distinct characters: 96
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4798
Unique (%): 95.1%

Sample

1st row: #Horror
2nd row: [Rec] 2
3rd row: [Rec]
4th row: 10 Cloverfield Lane
5th row: 10 Days in a Madhouse

Common Values

Value                     Count  Frequency (%)
King Kong                 3      0.1%
Halloween                 3      0.1%
Pan                       3      0.1%
Victor Frankenstein       3      0.1%
Home                      3      0.1%
Ben-Hur                   3      0.1%
The Fast and the Furious  3      0.1%
The Full Monty            2      < 0.1%
Dawn of the Dead          2      < 0.1%
The Jungle Book           2      < 0.1%
Other values (4907)       5016   99.5%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
the                  1606   11.5%
of                   483    3.5%
a                    193    1.4%
and                  150    1.1%
in                   123    0.9%
to                   107    0.8%
2                    104    0.7%
(blank)              81     0.6%
man                  66     0.5%
love                 56     0.4%
Other values (4905)  10987  78.7%

Most occurring characters

ValueCountFrequency (%)
10209
 
13.0%
e7898
 
10.1%
a4859
 
6.2%
o4669
 
6.0%
n4141
 
5.3%
r4135
 
5.3%
i3933
 
5.0%
t3818
 
4.9%
s3007
 
3.8%
h2975
 
3.8%
Other values (86)28774
36.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter54383
69.4%
Uppercase Letter12232
 
15.6%
Space Separator10209
 
13.0%
Other Punctuation952
 
1.2%
Decimal Number527
 
0.7%
Dash Punctuation95
 
0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Currency Symbol4
 
< 0.1%
Other Number2
 
< 0.1%
Other values (3)4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e7898
14.5%
a4859
 
8.9%
o4669
 
8.6%
n4141
 
7.6%
r4135
 
7.6%
i3933
 
7.2%
t3818
 
7.0%
s3007
 
5.5%
h2975
 
5.5%
l2538
 
4.7%
Other values (25)12410
22.8%
Uppercase Letter
ValueCountFrequency (%)
T1724
14.1%
S1054
 
8.6%
M821
 
6.7%
B778
 
6.4%
D727
 
5.9%
C687
 
5.6%
A664
 
5.4%
L580
 
4.7%
H569
 
4.7%
W505
 
4.1%
Other values (17)4123
33.7%
Other Punctuation
ValueCountFrequency (%)
:371
39.0%
'231
24.3%
.145
 
15.2%
,79
 
8.3%
&61
 
6.4%
!32
 
3.4%
?16
 
1.7%
/8
 
0.8%
*5
 
0.5%
#2
 
0.2%
Other values (2)2
 
0.2%
Decimal Number
ValueCountFrequency (%)
2147
27.9%
087
16.5%
387
16.5%
182
15.6%
435
 
6.6%
822
 
4.2%
521
 
4.0%
917
 
3.2%
715
 
2.8%
614
 
2.7%
Open Punctuation
ValueCountFrequency (%)
(3
60.0%
[2
40.0%
Close Punctuation
ValueCountFrequency (%)
)3
60.0%
]2
40.0%
Currency Symbol
ValueCountFrequency (%)
$2
50.0%
¢2
50.0%
Space Separator
ValueCountFrequency (%)
(space)10209
100.0%
Other Number
ValueCountFrequency (%)
½2
100.0%
Other Symbol
ValueCountFrequency (%)
°1
100.0%
Dash Punctuation
ValueCountFrequency (%)
-95
100.0%
Math Symbol
ValueCountFrequency (%)
+2
100.0%
Connector Punctuation
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin66615
84.9%
Common11803
 
15.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e7898
 
11.9%
a4859
 
7.3%
o4669
 
7.0%
n4141
 
6.2%
r4135
 
6.2%
i3933
 
5.9%
t3818
 
5.7%
s3007
 
4.5%
h2975
 
4.5%
l2538
 
3.8%
Other values (52)24642
37.0%
Common
ValueCountFrequency (%)
(space)10209
86.5%
:371
 
3.1%
'231
 
2.0%
2147
 
1.2%
.145
 
1.2%
-95
 
0.8%
087
 
0.7%
387
 
0.7%
182
 
0.7%
,79
 
0.7%
Other values (24)270
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII78395
> 99.9%
Latin 1 Sup23
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space)10209
 
13.0%
e7898
 
10.1%
a4859
 
6.2%
o4669
 
6.0%
n4141
 
5.3%
r4135
 
5.3%
i3933
 
5.0%
t3818
 
4.9%
s3007
 
3.8%
h2975
 
3.8%
Other values (72)28751
36.7%
Latin 1 Sup
ValueCountFrequency (%)
é8
34.8%
½2
 
8.7%
¢2
 
8.7%
Æ1
 
4.3%
°1
 
4.3%
ü1
 
4.3%
í1
 
4.3%
ó1
 
4.3%
à1
 
4.3%
ä1
 
4.3%
Other values (4)4
17.4%

num_voted_users
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct4826
Distinct (%)95.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean83668.16082
Minimum5
Maximum1689764
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum5
5-th percentile514.6
Q18593.5
median34359
Q396309
95-th percentile332254.9
Maximum1689764
Range1689759
Interquartile range (IQR)87715.5

Descriptive statistics

Standard deviation138485.2568
Coefficient of variation (CV)1.655172714
Kurtosis24.44552017
Mean83668.16082
Median Absolute Deviation (MAD)30816
Skewness4.029871144
Sum421938535
Variance1.917816635 × 10^10
MonotonicityNot monotonic
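Several of the figures above follow directly from others, which makes them easy to sanity-check. The sketch below recomputes the range, interquartile range, coefficient of variation and variance from the reported minimum, quartiles, maximum, mean and standard deviation of num_voted_users. The function name `derived_stats` is hypothetical; the input numbers are copied from the tables of this report.

```python
import math

def derived_stats(minimum, q1, q3, maximum, mean, std):
    """Recompute statistics that follow from other reported values."""
    return {
        "range": maximum - minimum,   # Maximum - Minimum
        "iqr": q3 - q1,               # Q3 - Q1
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,         # square of the standard deviation
    }

# Values copied from the num_voted_users tables in this report.
stats = derived_stats(
    minimum=5, q1=8593.5, q3=96309, maximum=1689764,
    mean=83668.16082, std=138485.2568,
)

for name, value in stats.items():
    print(f"{name}: {value:.10g}")

# The recomputed values agree with the reported rows up to rounding.
assert math.isclose(stats["cv"], 1.655172714, rel_tol=1e-6)
```

A coefficient of variation above 1 together with a mean (83668) well above the median (34359) is consistent with the strong right skew reported for this variable.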
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
575
 
0.1%
64
 
0.1%
3743
 
0.1%
623
 
0.1%
533
 
0.1%
25413
 
0.1%
383
 
0.1%
60253
 
0.1%
1623
 
0.1%
31193
 
0.1%
Other values (4816)5010
99.3%
ValueCountFrequency (%)
52
< 0.1%
64
0.1%
72
< 0.1%
83
0.1%
101
 
< 0.1%
131
 
< 0.1%
152
< 0.1%
161
 
< 0.1%
182
< 0.1%
191
 
< 0.1%
ValueCountFrequency (%)
16897641
< 0.1%
16761691
< 0.1%
14682001
< 0.1%
13474611
< 0.1%
13246801
< 0.1%
12512221
< 0.1%
12387461
< 0.1%
12177521
< 0.1%
12157181
< 0.1%
11557701
< 0.1%

cast_total_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct3978
Distinct (%)78.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9699.063851
Minimum0
Maximum656730
Zeros33
Zeros (%)0.7%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum0
5-th percentile179
Q11411
median3090
Q313756.5
95-th percentile36927.7
Maximum656730
Range656730
Interquartile range (IQR)12345.5

Descriptive statistics

Standard deviation18163.79912
Coefficient of variation (CV)1.872737349
Kurtosis361.2551153
Mean9699.063851
Median Absolute Deviation (MAD)2302
Skewness12.83192773
Sum48912379
Variance329923598.6
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
033
 
0.7%
57
 
0.1%
20206
 
0.1%
26
 
0.1%
6735
 
0.1%
10445
 
0.1%
295
 
0.1%
6794
 
0.1%
154
 
0.1%
814
 
0.1%
Other values (3968)4964
98.4%
ValueCountFrequency (%)
033
0.7%
26
 
0.1%
31
 
< 0.1%
42
 
< 0.1%
57
 
0.1%
62
 
< 0.1%
71
 
< 0.1%
82
 
< 0.1%
101
 
< 0.1%
112
 
< 0.1%
ValueCountFrequency (%)
6567301
< 0.1%
3037171
< 0.1%
2839391
< 0.1%
2635841
< 0.1%
2618181
< 0.1%
1701181
< 0.1%
1402681
< 0.1%
1377121
< 0.1%
1207971
< 0.1%
1080161
< 0.1%

actor_3_name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct3521
Distinct (%)70.1%
Missing23
Missing (%)0.5%
Memory size347.3 KiB
Steve Coogan
 
8
Ben Mendelsohn
 
8
John Heard
 
8
Robert Duvall
 
7
Stephen Root
 
7
Other values (3516)
4982 

Length

Max length29
Median length13
Mean length13.08227092
Min length3

Characters and Unicode

Total characters65673
Distinct characters81
Distinct categories6
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2648
Unique (%)52.7%
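The report lists both a Distinct count (3521) and a Unique count (2648) for this variable. A reading consistent with those numbers is that Distinct counts the different values present, while Unique counts only the values that occur exactly once. The sketch below shows the difference on a small made-up list; the names are borrowed from this report, but the list itself and the helper name are illustrative assumptions.

```python
from collections import Counter

# Illustrative toy column (not real rows from the dataset).
names = ["Steve Coogan", "Steve Coogan", "John Heard", "Lydia Hearst", "Andrea Ros"]

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

distinct, unique = distinct_and_unique(names)
print(distinct, unique)
```

Under this reading, the difference between the two counts is the number of names that appear more than once.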

Sample

1st rowLydia Hearst
2nd rowAndrea Ros
3rd rowCarlos Lasarte
4th rowSumalee Montano
5th rowAlexandra Callas

Common Values

ValueCountFrequency (%)
Steve Coogan8
 
0.2%
Ben Mendelsohn8
 
0.2%
John Heard8
 
0.2%
Robert Duvall7
 
0.1%
Stephen Root7
 
0.1%
Sam Shepard7
 
0.1%
Jon Gries7
 
0.1%
Kirsten Dunst7
 
0.1%
Lois Maxwell7
 
0.1%
Anne Hathaway7
 
0.1%
Other values (3511)4947
98.1%
(Missing)23
 
0.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
michael86
 
0.8%
john80
 
0.8%
david70
 
0.7%
james69
 
0.7%
robert46
 
0.4%
tom43
 
0.4%
paul42
 
0.4%
kevin41
 
0.4%
peter38
 
0.4%
steve36
 
0.3%
Other values (4307)9842
94.7%

Most occurring characters

ValueCountFrequency (%)
e6190
 
9.4%
a5995
 
9.1%
(space)5373
 
8.2%
n4589
 
7.0%
r4183
 
6.4%
i3975
 
6.1%
o3584
 
5.5%
l3508
 
5.3%
t2354
 
3.6%
s2343
 
3.6%
Other values (71)23579
35.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter49295
75.1%
Uppercase Letter10690
 
16.3%
Space Separator5373
 
8.2%
Other Punctuation234
 
0.4%
Dash Punctuation79
 
0.1%
Decimal Number2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e6190
12.6%
a5995
12.2%
n4589
9.3%
r4183
 
8.5%
i3975
 
8.1%
o3584
 
7.3%
l3508
 
7.1%
t2354
 
4.8%
s2343
 
4.8%
h1857
 
3.8%
Other values (34)10717
21.7%
Uppercase Letter
ValueCountFrequency (%)
M986
 
9.2%
J832
 
7.8%
S830
 
7.8%
B806
 
7.5%
C792
 
7.4%
D653
 
6.1%
R615
 
5.8%
A589
 
5.5%
L536
 
5.0%
K464
 
4.3%
Other values (21)3587
33.6%
Other Punctuation
ValueCountFrequency (%)
.171
73.1%
'63
 
26.9%
Decimal Number
ValueCountFrequency (%)
51
50.0%
01
50.0%
Space Separator
ValueCountFrequency (%)
(space)5373
100.0%
Dash Punctuation
ValueCountFrequency (%)
-79
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin59985
91.3%
Common5688
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e6190
 
10.3%
a5995
 
10.0%
n4589
 
7.7%
r4183
 
7.0%
i3975
 
6.6%
o3584
 
6.0%
l3508
 
5.8%
t2354
 
3.9%
s2343
 
3.9%
h1857
 
3.1%
Other values (65)21407
35.7%
Common
ValueCountFrequency (%)
(space)5373
94.5%
.171
 
3.0%
-79
 
1.4%
'63
 
1.1%
51
 
< 0.1%
01
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII65537
99.8%
Latin 1 Sup136
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e6190
 
9.4%
a5995
 
9.1%
(space)5373
 
8.2%
n4589
 
7.0%
r4183
 
6.4%
i3975
 
6.1%
o3584
 
5.5%
l3508
 
5.4%
t2354
 
3.6%
s2343
 
3.6%
Other values (48)23443
35.8%
Latin 1 Sup
ValueCountFrequency (%)
é49
36.0%
í14
 
10.3%
á13
 
9.6%
ó9
 
6.6%
ë7
 
5.1%
ü7
 
5.1%
à6
 
4.4%
è4
 
2.9%
ç3
 
2.2%
å3
 
2.2%
Other values (13)21
15.4%

facenumber_in_poster
Real number (ℝ≥0)

ZEROS

Distinct19
Distinct (%)0.4%
Missing13
Missing (%)0.3%
Infinite0
Infinite (%)0.0%
Mean1.371172962
Minimum0
Maximum43
Zeros2152
Zeros (%)42.7%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q32
95-th percentile5
Maximum43
Range43
Interquartile range (IQR)2

Descriptive statistics

Standard deviation2.01357592
Coefficient of variation (CV)1.468506144
Kurtosis52.03373533
Mean1.371172962
Median Absolute Deviation (MAD)1
Skewness4.384765939
Sum6897
Variance4.054487986
MonotonicityNot monotonic
Histogram with fixed size bins (bins=19)
ValueCountFrequency (%)
02152
42.7%
11251
24.8%
2716
 
14.2%
3380
 
7.5%
4207
 
4.1%
5114
 
2.3%
676
 
1.5%
748
 
1.0%
837
 
0.7%
918
 
0.4%
Other values (9)31
 
0.6%
(Missing)13
 
0.3%
ValueCountFrequency (%)
02152
42.7%
11251
24.8%
2716
 
14.2%
3380
 
7.5%
4207
 
4.1%
5114
 
2.3%
676
 
1.5%
748
 
1.0%
837
 
0.7%
918
 
0.4%
ValueCountFrequency (%)
431
 
< 0.1%
311
 
< 0.1%
191
 
< 0.1%
156
 
0.1%
141
 
< 0.1%
132
 
< 0.1%
124
 
0.1%
115
 
0.1%
1010
0.2%
918
0.4%

plot_keywords
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct4760
Distinct (%)97.3%
Missing153
Missing (%)3.0%
Memory size527.5 KiB
based on novel
 
4
assistant|experiment|frankenstein|medical student|scientist
 
3
one word title
 
3
alien friendship|alien invasion|australia|flying car|mother daughter relationship
 
3
1940s|child hero|fantasy world|orphan|reference to peter pan
 
3
Other values (4755)
4874 

Length

Max length149
Median length50
Mean length52.43312883
Min length2

Characters and Unicode

Total characters256398
Distinct characters42
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4639
Unique (%)94.9%

Sample

1st rowbullying|cyberbullying|girl|internet|throat slitting
2nd rowapartment|apartment building|blood sample|crucifix|zombie
3rd rowapartment building|character's point of view camera shot|fire station|subjective camera|television reporter
4th rowalien|bunker|car crash|kidnapping|minimal cast
5th rowdating|protective father|school|shrew|teen movie
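The sample rows show that each plot_keywords value packs several keywords into one string separated by `|`, which helps explain why nearly every full string is distinct. A minimal sketch of splitting the field and counting individual keywords follows; the helper name `keyword_counts` is hypothetical and the input is just the five sample rows, not the full column.

```python
from collections import Counter

# The five sample rows shown for plot_keywords in this report.
rows = [
    "bullying|cyberbullying|girl|internet|throat slitting",
    "apartment|apartment building|blood sample|crucifix|zombie",
    "apartment building|character's point of view camera shot|fire station|subjective camera|television reporter",
    "alien|bunker|car crash|kidnapping|minimal cast",
    "dating|protective father|school|shrew|teen movie",
]

def keyword_counts(values, sep="|"):
    """Split each value on `sep` and count the individual keywords."""
    counts = Counter()
    for value in values:
        if not value:  # skip missing or empty entries
            continue
        counts.update(part.strip() for part in value.split(sep))
    return counts

counts = keyword_counts(rows)
print(counts.most_common(3))
```

Counting at the keyword level rather than the whole-string level gives a far lower cardinality and is usually the more useful view of a delimited field like this one.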

Common Values

ValueCountFrequency (%)
based on novel4
 
0.1%
assistant|experiment|frankenstein|medical student|scientist3
 
0.1%
one word title3
 
0.1%
alien friendship|alien invasion|australia|flying car|mother daughter relationship3
 
0.1%
1940s|child hero|fantasy world|orphan|reference to peter pan3
 
0.1%
halloween|masked killer|michael myers|slasher|trick or treat3
 
0.1%
eighteen wheeler|illegal street racing|truck|trucker|undercover cop3
 
0.1%
animal name in title|ape abducts a woman|gorilla|island|king kong3
 
0.1%
ghost|haunted|haunting|house|paranormal investigator2
 
< 0.1%
famous line|hand to hand combat|kraken|rape|zeus2
 
< 0.1%
Other values (4750)4861
96.4%
(Missing)153
 
3.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
in331
 
1.8%
of222
 
1.2%
on209
 
1.2%
the191
 
1.1%
a185
 
1.0%
to180
 
1.0%
york122
 
0.7%
based106
 
0.6%
female104
 
0.6%
by99
 
0.5%
Other values (11486)16269
90.3%

Most occurring characters

ValueCountFrequency (%)
e24818
 
9.7%
a19577
 
7.6%
|19207
 
7.5%
i18742
 
7.3%
r18124
 
7.1%
t16182
 
6.3%
n15662
 
6.1%
o15480
 
6.0%
s13297
 
5.2%
(space)13128
 
5.1%
Other values (32)82181
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter222711
86.9%
Math Symbol19207
 
7.5%
Space Separator13128
 
5.1%
Decimal Number1131
 
0.4%
Other Punctuation219
 
0.1%
Open Punctuation1
 
< 0.1%
Close Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e24818
11.1%
a19577
 
8.8%
i18742
 
8.4%
r18124
 
8.1%
t16182
 
7.3%
n15662
 
7.0%
o15480
 
7.0%
s13297
 
6.0%
l11203
 
5.0%
c9463
 
4.2%
Other values (16)60163
27.0%
Decimal Number
ValueCountFrequency (%)
1284
25.1%
0270
23.9%
9222
19.6%
281
 
7.2%
865
 
5.7%
749
 
4.3%
547
 
4.2%
344
 
3.9%
638
 
3.4%
431
 
2.7%
Other Punctuation
ValueCountFrequency (%)
.130
59.4%
'89
40.6%
Math Symbol
ValueCountFrequency (%)
|19207
100.0%
Space Separator
ValueCountFrequency (%)
(space)13128
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin222711
86.9%
Common33687
 
13.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e24818
11.1%
a19577
 
8.8%
i18742
 
8.4%
r18124
 
8.1%
t16182
 
7.3%
n15662
 
7.0%
o15480
 
7.0%
s13297
 
6.0%
l11203
 
5.0%
c9463
 
4.2%
Other values (16)60163
27.0%
Common
ValueCountFrequency (%)
|19207
57.0%
(space)13128
39.0%
1284
 
0.8%
0270
 
0.8%
9222
 
0.7%
.130
 
0.4%
'89
 
0.3%
281
 
0.2%
865
 
0.2%
749
 
0.1%
Other values (6)162
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII256398
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e24818
 
9.7%
a19577
 
7.6%
|19207
 
7.5%
i18742
 
7.3%
r18124
 
7.1%
t16182
 
6.3%
n15662
 
6.1%
o15480
 
6.0%
s13297
 
5.2%
(space)13128
 
5.1%
Other values (32)82181
32.1%

movie_imdb_link
URL

Distinct4919
Distinct (%)97.5%
Missing0
Missing (%)0.0%
Memory size536.9 KiB
http://www.imdb.com/title/tt2224026/?ref_=fn_tt_tt_1
 
3
http://www.imdb.com/title/tt0360717/?ref_=fn_tt_tt_1
 
3
http://www.imdb.com/title/tt1976009/?ref_=fn_tt_tt_1
 
3
http://www.imdb.com/title/tt2638144/?ref_=fn_tt_tt_1
 
3
http://www.imdb.com/title/tt0232500/?ref_=fn_tt_tt_1
 
3
Other values (4914)
5028 
ValueCountFrequency (%)
http://www.imdb.com/title/tt2224026/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt0360717/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt1976009/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt2638144/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt0232500/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt0077651/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt3332064/?ref_=fn_tt_tt_13
 
0.1%
http://www.imdb.com/title/tt0467406/?ref_=fn_tt_tt_12
 
< 0.1%
http://www.imdb.com/title/tt1502712/?ref_=fn_tt_tt_12
 
< 0.1%
http://www.imdb.com/title/tt0082517/?ref_=fn_tt_tt_12
 
< 0.1%
Other values (4909)5016
99.5%
ValueCountFrequency (%)
http5043
100.0%
ValueCountFrequency (%)
www.imdb.com5043
100.0%
ValueCountFrequency (%)
/title/tt0077651/3
 
0.1%
/title/tt3332064/3
 
0.1%
/title/tt2224026/3
 
0.1%
/title/tt0232500/3
 
0.1%
/title/tt1976009/3
 
0.1%
/title/tt0360717/3
 
0.1%
/title/tt2638144/3
 
0.1%
/title/tt1939659/2
 
< 0.1%
/title/tt0363547/2
 
< 0.1%
/title/tt0443543/2
 
< 0.1%
Other values (4909)5016
99.5%
ValueCountFrequency (%)
ref_=fn_tt_tt_15043
100.0%
ValueCountFrequency (%)
5043
100.0%
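The five frequency tables above appear to break each URL into its scheme, network location, path, query and fragment. The sketch below reproduces that decomposition for one of the listed links using `urllib.parse.urlsplit` from the Python standard library; extracting the title identifier from the path is an added illustration, not something the report itself does.

```python
from urllib.parse import urlsplit

# One of the most frequent links listed in this report.
url = "http://www.imdb.com/title/tt2224026/?ref_=fn_tt_tt_1"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.netloc)    # www.imdb.com
print(parts.path)      # /title/tt2224026/
print(parts.query)     # ref_=fn_tt_tt_1
print(repr(parts.fragment))  # '' (no fragment present)

# Illustrative extra step: the last non-empty path segment is the title id.
title_id = parts.path.strip("/").split("/")[-1]
print(title_id)        # tt2224026
```

Since the scheme, host and query are identical for all 5043 rows, only the path carries information, and the title identifier is a natural key for finding the repeated links.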

num_user_for_reviews
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct954
Distinct (%)19.0%
Missing21
Missing (%)0.4%
Infinite0
Infinite (%)0.0%
Mean272.7708084
Minimum1
Maximum5060
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum1
5-th percentile10
Q165
median156
Q3326
95-th percentile907.8
Maximum5060
Range5059
Interquartile range (IQR)261

Descriptive statistics

Standard deviation377.9828856
Coefficient of variation (CV)1.385716044
Kurtosis26.43829739
Mean272.7708084
Median Absolute Deviation (MAD)113
Skewness4.121475159
Sum1369855
Variance142871.0618
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
151
 
1.0%
333
 
0.7%
232
 
0.6%
2632
 
0.6%
1029
 
0.6%
628
 
0.6%
5026
 
0.5%
825
 
0.5%
3225
 
0.5%
3124
 
0.5%
Other values (944)4717
93.5%
ValueCountFrequency (%)
151
1.0%
232
0.6%
333
0.7%
423
0.5%
519
 
0.4%
628
0.6%
717
 
0.3%
825
0.5%
923
0.5%
1029
0.6%
ValueCountFrequency (%)
50601
< 0.1%
46671
< 0.1%
41441
< 0.1%
36461
< 0.1%
35971
< 0.1%
35161
< 0.1%
34001
< 0.1%
32861
< 0.1%
31891
< 0.1%
30541
< 0.1%

language
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct47
Distinct (%)0.9%
Missing12
Missing (%)0.2%
Memory size314.8 KiB
English
4704 
French
 
73
Spanish
 
40
Hindi
 
28
Mandarin
 
26
Other values (42)
 
160

Length

Max length10
Median length7
Mean length6.980719539
Min length4

Characters and Unicode

Total characters35120
Distinct characters43
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique18
Unique (%)0.4%

Sample

1st rowEnglish
2nd rowSpanish
3rd rowSpanish
4th rowEnglish
5th rowEnglish

Common Values

ValueCountFrequency (%)
English4704
93.3%
French73
 
1.4%
Spanish40
 
0.8%
Hindi28
 
0.6%
Mandarin26
 
0.5%
German19
 
0.4%
Japanese18
 
0.4%
Cantonese11
 
0.2%
Russian11
 
0.2%
Italian11
 
0.2%
Other values (37)90
 
1.8%
(Missing)12
 
0.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english4704
93.5%
french73
 
1.5%
spanish40
 
0.8%
hindi28
 
0.6%
mandarin26
 
0.5%
german19
 
0.4%
japanese18
 
0.4%
cantonese11
 
0.2%
italian11
 
0.2%
russian11
 
0.2%
Other values (37)90
 
1.8%
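The two frequency tables for language give English as 93.3% and 93.5% respectively. That difference is consistent with the first table dividing by all 5043 observations (it includes a "(Missing)" row) and the second dividing by the 5031 non-missing values. The short check below reproduces both percentages under that assumption; the helper name `share` is hypothetical.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

observations = 5043   # number of observations in the dataset
missing = 12          # missing values reported for language
english = 4704        # rows with language == "English"

share_all = share(english, observations)                # denominator includes missing rows
share_present = share(english, observations - missing)  # denominator excludes missing rows

print(share_all, share_present)
```

The same distinction applies whenever a column has missing values: percentages in tables that carry a "(Missing)" row are relative to all rows, so they run slightly lower.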

Most occurring characters

ValueCountFrequency (%)
n5032
14.3%
i4906
14.0%
h4845
13.8%
s4828
13.7%
l4731
13.5%
g4722
13.4%
E4704
13.4%
a252
 
0.7%
e217
 
0.6%
r160
 
0.5%
Other values (33)723
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter30089
85.7%
Uppercase Letter5031
 
14.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n5032
16.7%
i4906
16.3%
h4845
16.1%
s4828
16.0%
l4731
15.7%
g4722
15.7%
a252
 
0.8%
e217
 
0.7%
r160
 
0.5%
c88
 
0.3%
Other values (13)308
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
E4704
93.5%
F74
 
1.5%
S47
 
0.9%
H34
 
0.7%
M28
 
0.6%
G20
 
0.4%
J18
 
0.4%
P17
 
0.3%
C15
 
0.3%
I15
 
0.3%
Other values (10)59
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Latin35120
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n5032
14.3%
i4906
14.0%
h4845
13.8%
s4828
13.7%
l4731
13.5%
g4722
13.4%
E4704
13.4%
a252
 
0.7%
e217
 
0.6%
r160
 
0.5%
Other values (33)723
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII35120
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n5032
14.3%
i4906
14.0%
h4845
13.8%
s4828
13.7%
l4731
13.5%
g4722
13.4%
E4704
13.4%
a252
 
0.7%
e217
 
0.6%
r160
 
0.5%
Other values (33)723
 
2.1%

country
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct63
Distinct (%)1.3%
Missing5
Missing (%)0.1%
Memory size297.9 KiB
USA
3809 
UK
448 
France
 
154
Canada
 
126
Germany
 
97
Other values (58)
404 

Length

Max length20
Median length3
Mean length3.486304089
Min length2

Characters and Unicode

Total characters17564
Distinct characters46
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique26
Unique (%)0.5%

Sample

1st rowUSA
2nd rowSpain
3rd rowSpain
4th rowUSA
5th rowUSA

Common Values

ValueCountFrequency (%)
USA3809
75.5%
UK448
 
8.9%
France154
 
3.1%
Canada126
 
2.5%
Germany97
 
1.9%
Australia55
 
1.1%
India34
 
0.7%
Spain33
 
0.7%
China30
 
0.6%
Italy23
 
0.5%
Other values (53)229
 
4.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
usa3809
74.7%
uk448
 
8.8%
france154
 
3.0%
canada126
 
2.5%
germany100
 
2.0%
australia55
 
1.1%
india34
 
0.7%
spain33
 
0.6%
china30
 
0.6%
japan23
 
0.5%
Other values (60)290
 
5.7%

Most occurring characters

ValueCountFrequency (%)
U4259
24.2%
A3879
22.1%
S3876
22.1%
a1092
 
6.2%
n637
 
3.6%
K481
 
2.7%
e407
 
2.3%
r404
 
2.3%
i245
 
1.4%
d218
 
1.2%
Other values (36)2066
11.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter13168
75.0%
Lowercase Letter4332
 
24.7%
Space Separator64
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1092
25.2%
n637
14.7%
e407
 
9.4%
r404
 
9.3%
i245
 
5.7%
d218
 
5.0%
c192
 
4.4%
l153
 
3.5%
y139
 
3.2%
m126
 
2.9%
Other values (14)719
16.6%
Uppercase Letter
ValueCountFrequency (%)
U4259
32.3%
A3879
29.5%
S3876
29.4%
K481
 
3.7%
C163
 
1.2%
F155
 
1.2%
G103
 
0.8%
I81
 
0.6%
N29
 
0.2%
J23
 
0.2%
Other values (11)119
 
0.9%
Space Separator
ValueCountFrequency (%)
(space)64
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin17500
99.6%
Common64
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
U4259
24.3%
A3879
22.2%
S3876
22.1%
a1092
 
6.2%
n637
 
3.6%
K481
 
2.7%
e407
 
2.3%
r404
 
2.3%
i245
 
1.4%
d218
 
1.2%
Other values (35)2002
11.4%
Common
ValueCountFrequency (%)
(space)64
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII17564
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U4259
24.2%
A3879
22.1%
S3876
22.1%
a1092
 
6.2%
n637
 
3.6%
K481
 
2.7%
e407
 
2.3%
r404
 
2.3%
i245
 
1.4%
d218
 
1.2%
Other values (36)2066
11.8%

content_rating
Categorical

HIGH CORRELATION
MISSING

Distinct18
Distinct (%)0.4%
Missing303
Missing (%)6.0%
Memory size286.5 KiB
R
2118 
PG-13
1461 
PG
701 
Not Rated
 
116
G
 
112
Other values (13)
232 

Length

Max length9
Median length2
Mean length2.813924051
Min length1

Characters and Unicode

Total characters13338
Distinct characters28
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2
Unique (%)< 0.1%

Sample

1st rowNot Rated
2nd rowR
3rd rowR
4th rowPG-13
5th rowR

Common Values

ValueCountFrequency (%)
R2118
42.0%
PG-131461
29.0%
PG701
 
13.9%
Not Rated116
 
2.3%
G112
 
2.2%
Unrated62
 
1.2%
Approved55
 
1.1%
TV-1430
 
0.6%
TV-MA20
 
0.4%
TV-PG13
 
0.3%
Other values (8)52
 
1.0%
(Missing)303
 
6.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
r2118
43.6%
pg-131461
30.1%
pg701
 
14.4%
not116
 
2.4%
rated116
 
2.4%
g112
 
2.3%
unrated62
 
1.3%
approved55
 
1.1%
tv-1430
 
0.6%
tv-ma20
 
0.4%
Other values (9)65
 
1.3%

Most occurring characters

ValueCountFrequency (%)
G2303
17.3%
R2234
16.7%
P2190
16.4%
-1543
11.6%
11498
11.2%
31461
11.0%
t294
 
2.2%
e242
 
1.8%
d242
 
1.8%
a187
 
1.4%
Other values (18)1144
8.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter7184
53.9%
Decimal Number2997
22.5%
Dash Punctuation1543
 
11.6%
Lowercase Letter1498
 
11.2%
Space Separator116
 
0.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
G2303
32.1%
R2234
31.1%
P2190
30.5%
N123
 
1.7%
T75
 
1.0%
V75
 
1.0%
A75
 
1.0%
U62
 
0.9%
M25
 
0.3%
X13
 
0.2%
Other values (2)9
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
t294
19.6%
e242
16.2%
d242
16.2%
a187
12.5%
o171
11.4%
r117
 
7.8%
p110
 
7.3%
n62
 
4.1%
v55
 
3.7%
s18
 
1.2%
Decimal Number
ValueCountFrequency (%)
11498
50.0%
31461
48.7%
430
 
1.0%
78
 
0.3%
Space Separator
ValueCountFrequency (%)
(space)116
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1543
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8682
65.1%
Common4656
34.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
G2303
26.5%
R2234
25.7%
P2190
25.2%
t294
 
3.4%
e242
 
2.8%
d242
 
2.8%
a187
 
2.2%
o171
 
2.0%
N123
 
1.4%
r117
 
1.3%
Other values (12)579
 
6.7%
Common
ValueCountFrequency (%)
-1543
33.1%
11498
32.2%
31461
31.4%
(space)116
 
2.5%
430
 
0.6%
78
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII13338
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
G2303
17.3%
R2234
16.7%
P2190
16.4%
-1543
11.6%
11498
11.2%
31461
11.0%
t294
 
2.2%
e242
 
1.8%
d242
 
1.8%
a187
 
1.4%
Other values (18)1144
8.6%

budget
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING
SKEWED

Distinct439
Distinct (%)9.6%
Missing492
Missing (%)9.8%
Infinite0
Infinite (%)0.0%
Mean39752620.44
Minimum218
Maximum1.22155 × 10^10
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum218
5-th percentile500000
Q16000000
median20000000
Q345000000
95-th percentile130000000
Maximum1.22155 × 10^10
Range1.221549978 × 10^10
Interquartile range (IQR)39000000

Descriptive statistics

Standard deviation206114898.4
Coefficient of variation (CV)5.184938658
Kurtosis2724.257433
Mean39752620.44
Median Absolute Deviation (MAD)16000000
Skewness48.15743539
Sum1.809141756 × 10^11
Variance4.248335136 × 10^16
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20000000174
 
3.5%
15000000143
 
2.8%
25000000142
 
2.8%
30000000141
 
2.8%
10000000135
 
2.7%
40000000131
 
2.6%
35000000120
 
2.4%
5000000111
 
2.2%
50000000101
 
2.0%
1200000092
 
1.8%
Other values (429)3261
64.7%
(Missing)492
 
9.8%
ValueCountFrequency (%)
2181
 
< 0.1%
11001
 
< 0.1%
14001
 
< 0.1%
32501
 
< 0.1%
45001
 
< 0.1%
70003
0.1%
90001
 
< 0.1%
100003
0.1%
130001
 
< 0.1%
140001
 
< 0.1%
Value              Count   Frequency (%)
1.22155 × 10^10    1       < 0.1%
4200000000         1       < 0.1%
2500000000         1       < 0.1%
2400000000         1       < 0.1%
2127519898         1       < 0.1%
1100000000         1       < 0.1%
1000000000         1       < 0.1%
700000000          2       < 0.1%
600000000          1       < 0.1%
553632000          1       < 0.1%

title_year
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct91
Distinct (%)1.8%
Missing108
Missing (%)2.1%
Infinite0
Infinite (%)0.0%
Mean2002.470517
Minimum1916
Maximum2016
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum1916
5-th percentile1979
Q11999
median2005
Q32011
95-th percentile2015
Maximum2016
Range100
Interquartile range (IQR)12

Descriptive statistics

Standard deviation12.47459892
Coefficient of variation (CV)0.006229604289
Kurtosis7.439212616
Mean2002.470517
Median Absolute Deviation (MAD)6
Skewness-2.29227335
Sum9882192
Variance155.6156182
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2009260
 
5.2%
2014252
 
5.0%
2006239
 
4.7%
2013237
 
4.7%
2010230
 
4.6%
2015226
 
4.5%
2008225
 
4.5%
2011225
 
4.5%
2005221
 
4.4%
2012221
 
4.4%
Other values (81)2599
51.5%
ValueCountFrequency (%)
19161
< 0.1%
19201
< 0.1%
19251
< 0.1%
19271
< 0.1%
19292
< 0.1%
19301
< 0.1%
19321
< 0.1%
19332
< 0.1%
19341
< 0.1%
19351
< 0.1%
ValueCountFrequency (%)
2016106
2.1%
2015226
4.5%
2014252
5.0%
2013237
4.7%
2012221
4.4%
2011225
4.5%
2010230
4.6%
2009260
5.2%
2008225
4.5%
2007204
4.0%

actor_2_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct917
Distinct (%)18.2%
Missing13
Missing (%)0.3%
Infinite0
Infinite (%)0.0%
Mean1651.754473
Minimum0
Maximum137000
Zeros55
Zeros (%)1.1%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum0
5-th percentile26
Q1281
median595
Q3918
95-th percentile11000
Maximum137000
Range137000
Interquartile range (IQR)637

Descriptive statistics

Standard deviation4042.438863
Coefficient of variation (CV)2.447360627
Kurtosis256.7951889
Mean1651.754473
Median Absolute Deviation (MAD)317
Skewness9.884733179
Sum8308325
Variance16341311.96
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1000309
 
6.1%
11000111
 
2.2%
2000100
 
2.0%
300076
 
1.5%
055
 
1.1%
1000047
 
0.9%
1400041
 
0.8%
1300040
 
0.8%
82637
 
0.7%
400034
 
0.7%
Other values (907)4180
82.9%
ValueCountFrequency (%)
055
1.1%
214
 
0.3%
314
 
0.3%
412
 
0.2%
510
 
0.2%
67
 
0.1%
74
 
0.1%
89
 
0.2%
913
 
0.3%
109
 
0.2%
ValueCountFrequency (%)
1370001
 
< 0.1%
290001
 
< 0.1%
270002
 
< 0.1%
250003
 
0.1%
230006
0.1%
2200011
0.2%
210004
 
0.1%
200006
0.1%
190007
0.1%
180009
0.2%

imdb_score
Real number (ℝ≥0)

HIGH CORRELATION

Distinct78
Distinct (%)1.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.442137616
Minimum1.6
Maximum9.5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum1.6
5-th percentile4.4
Q15.8
median6.6
Q37.2
95-th percentile8.09
Maximum9.5
Range7.9
Interquartile range (IQR)1.4

Descriptive statistics

Standard deviation1.125115866
Coefficient of variation (CV)0.1746494615
Kurtosis0.9356915064
Mean6.442137616
Median Absolute Deviation (MAD)0.7
Skewness-0.7414713363
Sum32487.7
Variance1.265885711
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6.7223
 
4.4%
6.6201
 
4.0%
7.2195
 
3.9%
6.5186
 
3.7%
6.4185
 
3.7%
7184
 
3.6%
7.3184
 
3.6%
6.8181
 
3.6%
7.1181
 
3.6%
6.1179
 
3.5%
Other values (68)3144
62.3%
ValueCountFrequency (%)
1.61
 
< 0.1%
1.71
 
< 0.1%
1.93
0.1%
22
< 0.1%
2.13
0.1%
2.23
0.1%
2.33
0.1%
2.42
< 0.1%
2.52
< 0.1%
2.62
< 0.1%
ValueCountFrequency (%)
9.51
 
< 0.1%
9.31
 
< 0.1%
9.21
 
< 0.1%
9.13
 
0.1%
93
 
0.1%
8.95
 
0.1%
8.87
 
0.1%
8.713
0.3%
8.615
0.3%
8.524
0.5%

aspect_ratio
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct22
Distinct (%)0.5%
Missing329
Missing (%)6.5%
Infinite0
Infinite (%)0.0%
Mean2.220403055
Minimum1.18
Maximum16
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum1.18
5-th percentile1.66
Q11.85
median2.35
Q32.35
95-th percentile2.35
Maximum16
Range14.82
Interquartile range (IQR)0.5

Descriptive statistics

Standard deviation1.385112535
Coefficient of variation (CV)0.6238113087
Kurtosis90.65322055
Mean2.220403055
Median Absolute Deviation (MAD)0
Skewness9.390056312
Sum10466.98
Variance1.918536735
MonotonicityNot monotonic
Histogram with fixed size bins (bins=22)
ValueCountFrequency (%)
2.352360
46.8%
1.851906
37.8%
1.78110
 
2.2%
1.37100
 
2.0%
1.3368
 
1.3%
1.6664
 
1.3%
1645
 
0.9%
2.215
 
0.3%
2.3915
 
0.3%
47
 
0.1%
Other values (12)24
 
0.5%
(Missing)329
 
6.5%
ValueCountFrequency (%)
1.181
 
< 0.1%
1.21
 
< 0.1%
1.3368
1.3%
1.37100
2.0%
1.441
 
< 0.1%
1.52
 
< 0.1%
1.6664
1.3%
1.753
 
0.1%
1.771
 
< 0.1%
1.78110
2.2%
ValueCountFrequency (%)
1645
 
0.9%
47
 
0.1%
2.763
 
0.1%
2.552
 
< 0.1%
2.43
 
0.1%
2.3915
 
0.3%
2.352360
46.8%
2.241
 
< 0.1%
2.215
 
0.3%
25
 
0.1%

movie_facebook_likes
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct876
Distinct (%)17.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7525.964505
Minimum0
Maximum349000
Zeros2181
Zeros (%)43.2%
Negative0
Negative (%)0.0%
Memory size39.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median166
Q33000
95-th percentile40000
Maximum349000
Range349000
Interquartile range (IQR)3000

Descriptive statistics

Standard deviation19320.44511
Coefficient of variation (CV)2.567171968
Kurtosis41.33443692
Mean7525.964505
Median Absolute Deviation (MAD)166
Skewness5.05892689
Sum37953439
Variance373279599.2
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count   Frequency (%)
0                     2181   43.2%
1000                   109   2.2%
11000                   83   1.6%
10000                   81   1.6%
12000                   62   1.2%
13000                   58   1.2%
2000                    56   1.1%
15000                   53   1.1%
14000                   50   1.0%
16000                   47   0.9%
Other values (866)    2263   44.9%
Smallest values
Value                Count   Frequency (%)
0                     2181   43.2%
2                        2   < 0.1%
3                        1   < 0.1%
4                        5   0.1%
5                        2   < 0.1%
7                        3   0.1%
8                        1   < 0.1%
9                        3   0.1%
10                       2   < 0.1%
11                       2   < 0.1%
Largest values
Value                Count   Frequency (%)
349000                   1   < 0.1%
199000                   1   < 0.1%
197000                   1   < 0.1%
191000                   1   < 0.1%
190000                   1   < 0.1%
175000                   1   < 0.1%
166000                   1   < 0.1%
165000                   1   < 0.1%
164000                   1   < 0.1%
153000                   1   < 0.1%

Interactions

[256 pairwise interaction plots of the numeric variables; images not included in this text version.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
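The calculation just described can be sketched in plain Python. This is a minimal illustration, not the report generator's own implementation. The function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares; the 1/n (or 1/(n-1)) factors of the
    # covariance and the standard deviations cancel, so they are left out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up): exact lines with different slopes and offsets.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_steep = [10.0 + 3.0 * v for v in x]
y_shallow = [-4.0 + 0.1 * v for v in x]
y_down = [2.0 - 0.5 * v for v in x]

print(pearson_r(x, y_steep), pearson_r(x, y_shallow), pearson_r(x, y_down))
```

Both increasing lines give r close to +1 regardless of slope, and the decreasing line gives r close to -1, which illustrates the invariance to location and scale mentioned above.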

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
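As a rough sketch of this rank-based calculation, the example below first converts each variable to ranks and then applies the covariance-over-standard-deviations formula to those ranks. Giving tied values the mean of their rank positions is a common convention and is assumed here. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up): a nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
print(average_ranks([10, 20, 20, 30]))
```

Because y increases whenever x does, the ranks coincide and ρ equals 1 even though the relationship is not linear.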

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
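The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements that definition directly. Statistical libraries often default to tau-b, which additionally adjusts for ties, so results on tied data may differ from what a library reports. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    points = list(zip(xs, ys))
    n = len(points)
    if n < 2:
        raise ValueError("need at least 2 observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie in x or y and counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (made up): one swapped pair out of six possible pairs.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

With four observations there are six pairs. One of them is discordant and five are concordant, so τ is (5 - 1) / 6.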

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
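To make the bias correction concrete, here is a minimal sketch of bias-corrected Cramér's V as the correction is commonly stated: the chi-squared statistic is divided by n, a term (k-1)(r-1)/(n-1) is subtracted, and the table dimensions are shrunk accordingly. This is not taken from the report generator's source. The function name and the toy tables are assumptions.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        raise ValueError("table too small for the corrected estimator")
    return math.sqrt(phi2_corr / denom)


# Toy tables (made up): perfect association and exact independence.
print(cramers_v_corrected([[10, 0], [0, 10]]))
print(cramers_v_corrected([[5, 5], [5, 5]]))
```

The first table yields a value of essentially 1 and the second yields 0, matching the endpoints of the range described above.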

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
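One way to read the nullity correlation described above is as the ordinary Pearson correlation between the "is present" indicators of two columns. The sketch below illustrates that reading on made-up data. It assumes missing entries are represented by `None`, and the function and column names are hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column carries no signal
    return cov / math.sqrt(var_a * var_b)


# Toy columns (made up); None marks a missing entry.
col_x = [1.0, None, 3.0, None, 5.0, 6.0]
col_same = [2.0, None, 1.0, None, 4.0, 9.0]          # missing in the same rows as col_x
col_opposite = [None, 7.0, None, 6.5, None, None]    # present only where col_x is missing

print(nullity_correlation(col_x, col_same))
print(nullity_correlation(col_x, col_opposite))
```

A value near +1 means the two columns tend to be missing together, and a value near -1 means one tends to be present exactly when the other is missing.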

Sample

First rows

color, director_name, num_critic_for_reviews, duration, director_facebook_likes, actor_3_facebook_likes, actor_2_name, actor_1_facebook_likes, gross, genres, actor_1_name, movie_title, num_voted_users, cast_total_facebook_likes, actor_3_name, facenumber_in_poster, plot_keywords, movie_imdb_link, num_user_for_reviews, language, country, content_rating, budget, title_year, actor_2_facebook_likes, imdb_score, aspect_ratio, movie_facebook_likes
0ColorTara Subkoff35.0101.037.056.0Balthazar Getty501.0NaNDrama|Horror|Mystery|ThrillerTimothy Hutton#Horror15471044Lydia Hearst1.0bullying|cyberbullying|girl|internet|throat slittinghttp://www.imdb.com/title/tt3526286/?ref_=fn_tt_tt_142.0EnglishUSANot Rated1500000.02015.0418.03.3NaN750
1ColorJaume Balagueró222.085.057.06.0Pablo Rosso37.027024.0HorrorJonathan D. Mellor[Rec] 25559773Andrea Ros0.0apartment|apartment building|blood sample|crucifix|zombiehttp://www.imdb.com/title/tt1245112/?ref_=fn_tt_tt_1148.0SpanishSpainR5600000.02009.09.06.61.854000
2ColorJaume Balagueró252.078.057.07.0Pablo Rosso120.0NaNHorrorManuela Velasco[Rec]131462145Carlos Lasarte0.0apartment building|character's point of view camera shot|fire station|subjective camera|television reporterhttp://www.imdb.com/title/tt1038988/?ref_=fn_tt_tt_1374.0SpanishSpainR1500000.02007.09.07.51.8515000
3ColorDan Trachtenberg411.0104.016.082.0John Gallagher Jr.14000.071897215.0Drama|Horror|Mystery|Sci-Fi|ThrillerBradley Cooper10 Cloverfield Lane12689314504Sumalee Montano0.0alien|bunker|car crash|kidnapping|minimal casthttp://www.imdb.com/title/tt1179933/?ref_=fn_tt_tt_1440.0EnglishUSAPG-1315000000.02016.0338.07.32.3533000
4ColorTimothy Hines1.0111.00.0247.0Kelly LeBrock1000.014616.0DramaChristopher Lambert10 Days in a Madhouse3142059Alexandra Callas1.0NaNhttp://www.imdb.com/title/tt3453052/?ref_=fn_tt_tt_110.0EnglishUSAR12000000.02015.0445.07.51.8526000
5ColorGil Junger133.097.019.0835.0Heath Ledger23000.038176108.0Comedy|Drama|RomanceJoseph Gordon-Levitt10 Things I Hate About You22209937907Andrew Keegan6.0dating|protective father|school|shrew|teen moviehttp://www.imdb.com/title/tt0147800/?ref_=fn_tt_tt_1549.0EnglishUSAPG-1316000000.01999.013000.07.21.8510000
6NaNChristopher BarnardNaN22.00.0NaNNaN5.0NaNComedyMathew Buck10,000 B.C.65NaN0.0NaNhttp://www.imdb.com/title/tt1869849/?ref_=fn_tt_tt_1NaNNaNNaNNaNNaNNaNNaN7.2NaN0
7ColorKevin Lima84.0100.036.0439.0Eric Idle2000.066941559.0Adventure|Comedy|FamilyIoan Gruffudd102 Dalmatians264134182Jim Carter1.0dog|parole|parole officer|prison|puppyhttp://www.imdb.com/title/tt0211181/?ref_=fn_tt_tt_177.0EnglishUSAG85000000.02000.0795.04.81.85372
8ColorRobert Moresco26.0107.053.0463.0Brad Renfro954.053481.0Crime|Drama|ThrillerBrian Dennehy10th & Wolf55572512Dash Mihok5.0desert storm|fbi|fbi agent|fragmentation grenade|woman kills attackerhttp://www.imdb.com/title/tt0360323/?ref_=fn_tt_tt_134.0EnglishUSAR8000000.02006.0551.06.42.35294
9ColorGreg Marcks68.085.09.0407.0Barbara Hershey861.0NaNComedy|Crime|DramaHenry Thomas11:14382732200Shawn Hatosy1.0convenience store|multiple perspectives|murder|paramedic|vanhttp://www.imdb.com/title/tt0331811/?ref_=fn_tt_tt_1133.0EnglishUSAR6000000.02003.0618.07.21.850

Last rows

color, director_name, num_critic_for_reviews, duration, director_facebook_likes, actor_3_facebook_likes, actor_2_name, actor_1_facebook_likes, gross, genres, actor_1_name, movie_title, num_voted_users, cast_total_facebook_likes, actor_3_name, facenumber_in_poster, plot_keywords, movie_imdb_link, num_user_for_reviews, language, country, content_rating, budget, title_year, actor_2_facebook_likes, imdb_score, aspect_ratio, movie_facebook_likes
5033ColorMora Stephens35.0103.05.0842.0Alexandra Breckenridge1000.0NaNDrama|ThrillerRay WinstoneZipper40913408Elena Satine0.0escort|f word|no opening credits|one word title|prosecutorhttp://www.imdb.com/title/tt3346224/?ref_=fn_tt_tt_120.0EnglishUSAR4500000.02015.01000.05.72.35987
5034ColorKevin Hamedani64.089.023.023.0Janette Armand199.0NaNComedy|Horror|Sci-FiRussell HodgkinsonZMD: Zombies of Mass Destruction3650292Kevin Hamedani0.0cult|homosexual|island|survival horror|zombiehttp://www.imdb.com/title/tt1134674/?ref_=fn_tt_tt_139.0EnglishUSAR500000.02009.037.05.11.850
5035ColorDavid Fincher377.0162.021000.0495.0Jake Gyllenhaal21000.033048353.0Crime|Drama|History|Mystery|ThrillerRobert Downey Jr.Zodiac30127936928Anthony Edwards0.0cartoonist|reporter|serial killer|zodiac|zodiac killerhttp://www.imdb.com/title/tt0443706/?ref_=fn_tt_tt_1589.0EnglishUSAR65000000.02007.015000.07.72.3512000
5036ColorK. King150.093.03.0115.0Shona Kay214.0NaNAction|Comedy|HorrorJason K. WixomZombie Hunter2057656Jarrod Phillips2.0desert|drifter|seduction|siege|zombiehttp://www.imdb.com/title/tt2446502/?ref_=fn_tt_tt_130.0EnglishUSANot Rated1000000.02013.0211.03.52.350
5037ColorRuben Fleischer445.088.0181.011.0Bill Murray15000.075590286.0Adventure|Comedy|Horror|Sci-FiEmma StoneZombieland38621728011Derek Graf4.0amusement park|on the road|zombie|zombie apocalypse|zombie spoofhttp://www.imdb.com/title/tt1156398/?ref_=fn_tt_tt_1553.0EnglishUSAR23600000.02009.013000.07.72.3526000
5038ColorFrank Coraci178.0102.0153.0269.0Leslie Bibb3000.080360866.0Comedy|Family|RomanceRosario DawsonZookeeper446625392Nicholas Turturro1.0champagne bottle|coca cola|jewelry box|red bull|zoohttp://www.imdb.com/title/tt1222817/?ref_=fn_tt_tt_1127.0EnglishUSAPG80000000.02011.01000.05.22.350
5039ColorBen Stiller226.0102.00.01000.0Will Ferrell14000.028837115.0ComedyMilla JovovichZoolander 23496424107Justin Theroux4.0chosen one|fashion|fashion model|model|retiredhttp://www.imdb.com/title/tt1608290/?ref_=fn_tt_tt_1150.0EnglishUSAPG-1350000000.02016.08000.04.82.3528000
5040ColorBen Stiller135.090.00.08000.0Alexander Skarsgård14000.045162741.0ComedyMilla JovovichZoolander20108434565Will Ferrell0.0fashion|malaysia|male model|reporter|rivalhttp://www.imdb.com/title/tt0196229/?ref_=fn_tt_tt_1523.0EnglishGermanyPG-1328000000.02001.010000.06.62.350
5041ColorPeter Hewitt63.083.012.0690.0Rip Torn2000.011631245.0Action|Adventure|Family|Sci-FiKevin ZegersZoom150155022Thomas F. Wilson5.0bruise|female hero|super strength|superhero|teenage superherohttp://www.imdb.com/title/tt0383060/?ref_=fn_tt_tt_1113.0EnglishUSAPG35000000.02006.0826.04.21.85494
5042ColorJérôme Salle69.0110.022.044.0Tanya van Graan5000.0NaNCrime|Drama|ThrillerOrlando BloomZulu128175273Conrad Kemp0.0apartheid|corpse|male nudity|murder|police officerhttp://www.imdb.com/title/tt2249221/?ref_=fn_tt_tt_143.0EnglishFranceR16000000.02013.0170.06.72.350

Duplicate rows

Most frequently occurring

color, director_name, num_critic_for_reviews, duration, director_facebook_likes, actor_3_facebook_likes, actor_2_name, actor_1_facebook_likes, gross, genres, actor_1_name, movie_title, num_voted_users, cast_total_facebook_likes, actor_3_name, facenumber_in_poster, plot_keywords, movie_imdb_link, num_user_for_reviews, language, country, content_rating, budget, title_year, actor_2_facebook_likes, imdb_score, aspect_ratio, movie_facebook_likes, # duplicates
0Black and WhiteYimou Zhang283.080.0611.0576.0Tony Chiu Wai Leung5000.084961.0Action|Adventure|HistoryJet LiHero1494146229Maggie Cheung4.0china|flying|king|palace|swordhttp://www.imdb.com/title/tt0299977/?ref_=fn_tt_tt_1841.0MandarinChinaPG-1331000000.02002.0643.07.92.3502
1ColorAlbert Hughes208.0122.0117.0140.0Jason Flemyng40000.031598308.0Horror|Mystery|ThrillerJohnny DeppFrom Hell12476541636Ian Richardson1.0freemason|jack the ripper|opium|prostitute|victorian erahttp://www.imdb.com/title/tt0120681/?ref_=fn_tt_tt_1541.0EnglishUSAR35000000.02001.01000.06.82.3502
2ColorAngelina Jolie Pitt322.0137.011000.0465.0Jack O'Connell769.0115603980.0Biography|Drama|Sport|WarFinn WittrockUnbroken1035892938Alex Russell0.0emaciation|male nudity|plane crash|prisoner of war|torturehttp://www.imdb.com/title/tt1809398/?ref_=fn_tt_tt_1351.0EnglishUSAPG-1365000000.02014.0698.07.22.35350002
3ColorBill Condon322.0115.0386.012000.0Kristen Stewart21000.0292298923.0Adventure|Drama|Fantasy|RomanceRobert PattinsonThe Twilight Saga: Breaking Dawn - Part 218539459177Taylor Lautner3.0battle|friend|super strength|vampire|visionhttp://www.imdb.com/title/tt1673434/?ref_=fn_tt_tt_1329.0EnglishUSAPG-13120000000.02012.017000.05.52.35650002
4ColorBrett Ratner245.0101.0420.0467.0Rufus Sewell12000.072660029.0Action|AdventureDwayne JohnsonHercules11568716235Ingrid Bolsø Berdal0.0army|greek mythology|hercules|king|mercenaryhttp://www.imdb.com/title/tt1267297/?ref_=fn_tt_tt_1269.0EnglishUSAPG-13100000000.02014.03000.06.02.35210002
5ColorBruce McCulloch52.085.054.0455.0Megan Mullally985.013973532.0Comedy|CrimeMartin StarrStealing Harvard112113065Chris Penn1.0black humor|crying during sex|harvard|humor|man with glasseshttp://www.imdb.com/title/tt0265808/?ref_=fn_tt_tt_192.0EnglishUSAPG-1325000000.02002.0637.05.11.852152
6ColorDanny Boyle393.0101.00.0888.0Spencer Wilding3000.02319187.0Crime|Drama|Mystery|ThrillerRosario DawsonTrance926405056Tuppence Middleton0.0amnesia|criminal|heist|hypnotherapy|lost paintinghttp://www.imdb.com/title/tt1924429/?ref_=fn_tt_tt_1212.0EnglishUKR20000000.02013.01000.07.02.35230002
7ColorDavid Yates248.0110.0282.0103.0Alexander Skarsgård11000.0124051759.0Action|Adventure|Drama|RomanceChristoph WaltzThe Legend of Tarzan4237221175Casper Crump2.0africa|capture|jungle|male objectification|tarzanhttp://www.imdb.com/title/tt0918940/?ref_=fn_tt_tt_1239.0EnglishUSAPG-13180000000.02016.010000.06.62.35290002
8ColorFrank Oz168.087.00.0548.0Ewen Bremner22000.08579684.0ComedyPeter DinklageDeath at a Funeral8954724324Kris Marshall0.0end credits roll call|four word title|funeral|secret|unclehttp://www.imdb.com/title/tt0795368/?ref_=fn_tt_tt_1199.0EnglishUSAR9000000.02007.0557.07.41.8502
9ColorGuy Ritchie151.0104.00.01000.0Brad Pitt26000.030093107.0Comedy|CrimeJason StathamSnatch60099639175Jason Flemyng6.0boxer|boxing|diamond|fight|gypsyhttp://www.imdb.com/title/tt0208092/?ref_=fn_tt_tt_1726.0EnglishUKR6000000.02000.011000.08.31.85270002